Texts in, meaning out: neural language models in semantic similarity task for Russian
نویسندگان
چکیده
Distributed vector representations for natural language vocabulary get a lot of attention in contemporary computational linguistics. This paper summarizes the experience of applying neural network language models to the task of calculating semantic similarity for Russian. The experiments were performed in the course of Russian Semantic Similarity Evaluation track, where our models took from 2nd to 5th position, depending on the task. We introduce the tools and corpora used, comment on the nature of the evaluation track and describe the achieved results. It was found out that Continuous Skip-gram and Continuous Bag-of-words models, previously successfully applied to English material, can be used for semantic modeling of Russian as well. Moreover, we show that texts in Russian National Corpus (RNC) provide an excellent training material for such models, outperforming other, much larger corpora. It is especially true for semantic relatedness tasks (although stacking models trained on larger corpora on top of RNC models improves performance even more). High-quality semantic vectors learned in such a way can be used in a variety of linguistic tasks and promise an exciting field for further study.
منابع مشابه
The evolution of the meaning of the word nurse based on the classical texts of Persian literature
Background and Aim: The semantic evolution of a word over time is inevitable, indicating a social, political, religious or cultural process. Nurse is one of the words that has a significant presence in Persian literature texts and has been used in many different meanings such as slave, servan, maid, devotee, obedient, patient and preserver. The purpose of this study is to show its semantic ev...
متن کاملImpersonal Russian Sentences with the Subject in the Accusative Case and the Meaning of a Person\'s Physical Condition in the Terms of Persian Language
In this article, considering impersonal sentences with the subject in the accusative case, which conveys the physical state of a living being, an attempt is made to compare them with the Persian correlates. This type of impersonal sentences can cause different problems for the Persian-speaking students due to their grammatical specificity (e.g. the uses of the subject in the accusative, rather ...
متن کاملFuzzy-Semantic Similarity for Automatic Multilingual Plagiarism Detection
A word may have multiple meanings or senses, it could be modeled by considering that words in a sentence have a fuzzy set that contains words with similar meaning, which make detecting plagiarism a hard task especially when dealing with semantic meaning, and even harder for cross language plagiarism detection. Arabic is known by its richness, word’s constructions and meanings diversity, hence c...
متن کاملRUSSE: The First Workshop on Russian Semantic Similarity
The paper gives an overview of the Russian Semantic Similarity Evaluation (RUSSE) shared task held in conjunction with the Dialogue 2015 conference. There exist a lot of comparative studies on semantic similarity, yet no analysis of such measures was ever performed for the Russian language. Exploring this problem for the Russian language is even more interesting, because this language has featu...
متن کاملGibberish Semantics: How Good is Russian Twitter in Word Semantic Similarity Task?
The most studied and most successful language models were developed and evaluated mainly for English and other close European languages, such as French, German, etc. It is important to study applicability of these models to other languages. The use of vector space models for Russian was recently studied for multiple corpora, such as Wikipedia, RuWac, lib.ru. These models were evaluated against ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید
ثبت ناماگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید
ورودعنوان ژورنال:
- CoRR
دوره abs/1504.08183 شماره
صفحات -
تاریخ انتشار 2015